An RNN-based compensation method for Mandarin telephone speech recognition
Abstract
This paper proposes a novel architecture that integrates recurrent neural network (RNN) based compensation and hidden Markov model (HMM) based speech recognition into a unified framework. The RNN estimates the additive bias, which represents the telephone channel effect, in the cepstral domain. Telephone channel effects are compensated by subtracting this bias from the cepstral coefficients of the input utterance. The integrated recognition system is trained with the MCE/GPD (minimum classification error/generalized probabilistic descent) method, using an objective function designed to minimize the recognition error rate. Experimental results for speaker-independent Mandarin polysyllabic word recognition show an error rate reduction of 21.5% compared to the baseline system.
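The two mechanisms named in the abstract, subtracting an additive bias in the cepstral domain and training against a smoothed classification-error objective, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's system: `estimate_bias` is a hypothetical stand-in that uses the utterance mean (plain cepstral mean subtraction) in place of the RNN, whose structure the abstract does not describe, and `mce_loss` uses a commonly cited form of the MCE misclassification measure, with `eta` and `gamma` as assumed smoothing parameters.

```python
import math


def estimate_bias(frames):
    """Hypothetical bias estimator: the per-dimension utterance mean.

    Assumption: this replaces the paper's RNN estimator, which the
    abstract does not specify. It is ordinary cepstral mean subtraction.
    """
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def compensate(frames, bias):
    """Subtract the additive cepstral bias from every frame."""
    return [[c - b for c, b in zip(frame, bias)] for frame in frames]


def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed 0/1 loss for one utterance (a common MCE formulation).

    scores:  per-class discriminant scores, e.g. HMM log-likelihoods.
    correct: index of the true class.
    """
    rivals = [s for i, s in enumerate(scores) if i != correct]
    # Soft maximum over the competing classes, computed stably.
    m = max(rivals)
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    soft_max = m + math.log(mean_exp) / eta
    # Misclassification measure: positive when a rival outscores the truth.
    d = -scores[correct] + soft_max
    # Sigmoid turns the measure into a differentiable error count.
    return 1.0 / (1.0 + math.exp(-gamma * d))


# Toy utterance: two frames of two cepstral coefficients each.
frames = [[1.0, 2.0], [3.0, 4.0]]
bias = estimate_bias(frames)
compensated = compensate(frames, bias)
```

In an integrated system of the kind described, the gradient of such a loss would flow through both the recognizer and the bias estimator, so that the compensation is tuned for recognition accuracy rather than for signal fidelity.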
Similar resources
A robust RNN-based pre-classification for noisy Mandarin speech recognition
This paper addresses the problem of speech signal pre-classification for robust noisy speech recognition. A novel RNN-based pre-classification scheme for noisy Mandarin speech recognition is proposed. The RNN, which is trained to be insensitive to noise-level variation, is employed to classify each input frame into the three broad classes of initial, final and pure-noise. An on-line noise tracki...
Mandarin telephone speech recognition for automatic telephone number directory service
This paper discusses an HMM-based Mandarin telephone speech recognition method for implementing a prototype system of automatic telephone number directory service. It adopts the GPD/MCE training algorithm to train the HMMs for 100 final-dependent syllable initials and 40 syllable finals. The SBR method is used to compensate for speaker and channel effects. Besides, an RNN-based pre-clas...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
An RNN-based preclassification method for fast continuous Mandarin speech recognition
A novel recurrent neural network-based (RNN-based) frontend preclassification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame for the three broad classes of initial, final, and silence. A finite state machine (FSM) is then used to classify the input frame into four states including three stable states o...
Robust SBR method for adverse Mandarin speech recognition - Electronics Letters
An RNN-based robust signal bias removal (RRSBR) method is proposed for improving both the recognition performance and the computational efficiency of the SBR method for adverse Mandarin speech recognition. It differs from the SBR method in using three broad-class sub-codebooks to encode the feature vector of each frame and combining the three encoding residuals to form the frame-level s...
Publication date: 1998